GVPT722
Building confidence in your inference from a finite number of random samples.
Because we are working with randomness, we care about the probability that each random sample from our population will produce a similar set of estimates of our regression coefficients.
What is the relationship between an individual’s feelings towards President Obama and their party affiliation?
Rows: 5,916
Columns: 3
$ caseid <dbl> 408, 3282, 1942, 118, 5533, 5880, 1651, 6687, 5903, 629, 1…
$ obama_therm <dbl> 15, 100, 70, 30, 70, 45, 50, 60, 15, 100, NA, 0, 45, 30, 4…
$ dem <dbl> 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
|  | (1) |
|---|---|
| (Intercept) | 44.245*** |
| Democrat | 41.061*** |
| Num.Obs. | 5474 |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
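The table above comes from a bivariate regression of the thermometer score on the party indicator. A minimal sketch of that model fit (the real `nes` data are assumed above; a simulated stand-in is used here so the snippet runs on its own):

```r
set.seed(722)

# Simulated stand-in for the NES data (assumption: dem is 0/1,
# obama_therm is a 0-100 feeling thermometer)
nes_sim <- data.frame(dem = rbinom(5916, 1, 0.4))
nes_sim$obama_therm <- 44 + 41 * nes_sim$dem + rnorm(nrow(nes_sim), 0, 20)

# Bivariate regression: the intercept is the expected score for
# non-Democrats; the dem coefficient is the Democrat gap
m_full <- lm(obama_therm ~ dem, data = nes_sim)
coef(m_full)
```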
Let’s start with a random sample of 100 respondents:
Rows: 100
Columns: 3
$ caseid <dbl> 405, 5672, 372, 3522, 3882, 1437, 6178, 1148, 5379, 5953, …
$ obama_therm <dbl> 100, 85, 40, 100, 100, 100, NA, 60, 50, 100, 70, 100, NA, …
$ dem <dbl> 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0…
|  | (1) |
|---|---|
| (Intercept) | 47.268*** |
| Democrat | 44.192*** |
| Num.Obs. | 93 |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
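Drawing one random sample of 100 respondents and refitting can be sketched in base R (again with a simulated stand-in for the assumed `nes` frame):

```r
set.seed(722)

# Simulated stand-in population (assumption: dem is 0/1,
# obama_therm is a 0-100 thermometer score)
pop <- data.frame(dem = rbinom(5916, 1, 0.4))
pop$obama_therm <- 44 + 41 * pop$dem + rnorm(nrow(pop), 0, 20)

# One random sample of 100 respondents, then the same regression
draw  <- pop[sample(nrow(pop), 100), ]
m_100 <- lm(obama_therm ~ dem, data = draw)
coef(m_100)  # close to, but not exactly, the full-sample estimates
```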
Let’s take a different random sample of 100 respondents:
Rows: 100
Columns: 3
$ caseid <dbl> 6111, 148, 1913, 6246, 6640, 3983, 6779, 3650, 1452, 4943,…
$ obama_therm <dbl> 85, 100, 80, NA, 50, 100, NA, 70, 100, 50, 70, 70, 60, 85,…
$ dem <dbl> 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0…
|  | (1) |
|---|---|
| (Intercept) | 39.057*** |
| Democrat | 42.548*** |
| Num.Obs. | 96 |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Let’s take 1,000 different random samples of 100 respondents.
In practice, we can only take a finite number of random samples from our population. We can make our estimates more consistent by increasing our sample size: the larger each sample, the more tightly the estimates cluster around the true value.
Let’s look at 1,000 different random samples of 300 respondents.
And 1,000 different random samples of 1,000 respondents.
Do we get more consistent estimates?
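The repeated-sampling exercise can be sketched as follows (the helper `sample_slope()` is hypothetical, and a simulated stand-in replaces the assumed `nes` data):

```r
set.seed(722)

# Simulated stand-in population with a true Democrat gap of 41
pop <- data.frame(dem = rbinom(5916, 1, 0.4))
pop$obama_therm <- 44 + 41 * pop$dem + rnorm(nrow(pop), 0, 20)

# Slope estimate from one random sample of size n
sample_slope <- function(n) {
  draw <- pop[sample(nrow(pop), n), ]
  coef(lm(obama_therm ~ dem, data = draw))[["dem"]]
}

# 1,000 slope estimates at each sample size
slopes_100  <- replicate(1000, sample_slope(100))
slopes_300  <- replicate(1000, sample_slope(300))
slopes_1000 <- replicate(1000, sample_slope(1000))

# The spread shrinks as n grows: the estimates become more consistent
c(n100 = sd(slopes_100), n300 = sd(slopes_300), n1000 = sd(slopes_1000))
```

Plotting histograms of `slopes_100`, `slopes_300`, and `slopes_1000` would show the sampling distributions tightening around the true value as the sample size grows.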
A biased coefficient estimate will systematically be higher or lower than the true value.
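Bias can be illustrated by breaking the randomness of the sampling, for instance by conditioning on the outcome. In this hypothetical sketch, excluding the most enthusiastic respondents (scores of 90 or above) drags the Democrat coefficient systematically below its true value of 41 in the simulated population:

```r
set.seed(722)

# Simulated population with a true Democrat gap of 41
pop <- data.frame(dem = rbinom(5916, 1, 0.4))
pop$obama_therm <- 44 + 41 * pop$dem + rnorm(nrow(pop), 0, 20)

# Non-random sampling: drop everyone scoring 90 or above, which
# disproportionately removes Democrats' high thermometer scores
biased_slope <- function() {
  keep <- pop[pop$obama_therm < 90, ]
  draw <- keep[sample(nrow(keep), 100), ]
  coef(lm(obama_therm ~ dem, data = draw))[["dem"]]
}

bias_mean <- mean(replicate(1000, biased_slope()))
bias_mean  # systematically below 41, no matter how many samples we take
```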
What happens to our understanding of the relationship between an individual’s feelings towards Obama and their party affiliation?
nes_men <- nes |>
  filter(gender == "Male") |>
  select(caseid, obama_therm, dem, gender)

glimpse(nes_men)
Rows: 2,847
Columns: 4
$ caseid <dbl> 408, 3282, 1942, 118, 5533, 5880, 1651, 6687, 5903, 629, 1…
$ obama_therm <dbl> 15, 100, 70, 30, 70, 45, 50, 60, 15, 100, NA, 0, 45, 30, 4…
$ dem <dbl> 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ gender <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, Male…
Let’s take a random sample of 1,000 individuals from this male-only pool.
|  | Males only |
|---|---|
| (Intercept) | 43.575*** |
| Democrat | 41.641*** |
| Num.Obs. | 925 |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
|  | Males only | All respondents | 'True' relationship |
|---|---|---|---|
| (Intercept) | 43.575*** | 43.102*** | 44.245*** |
| Democrat | 41.641*** | 41.505*** | 41.061*** |
| Num.Obs. | 925 | 934 | 5474 |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
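The comparison above can be sketched with a simulated stand-in that adds a gender column (base-R equivalents of the dplyr `filter()` and sampling steps used earlier; gender is assumed unrelated to the outcome in this simulation):

```r
set.seed(722)

# Stand-in population; gender is simulated independently of the outcome
pop <- data.frame(
  gender = sample(c("Male", "Female"), 5916, replace = TRUE),
  dem    = rbinom(5916, 1, 0.4)
)
pop$obama_therm <- 44 + 41 * pop$dem + rnorm(nrow(pop), 0, 20)

# 1,000 respondents from the male-only pool vs. 1,000 from everyone
men      <- pop[pop$gender == "Male", ]
men_draw <- men[sample(nrow(men), 1000), ]
all_draw <- pop[sample(nrow(pop), 1000), ]

m_men <- lm(obama_therm ~ dem, data = men_draw)
m_all <- lm(obama_therm ~ dem, data = all_draw)

# Because gender is unrelated to the relationship here, both samples
# recover similar estimates of the Democrat gap
c(men = coef(m_men)[["dem"]], all = coef(m_all)[["dem"]])
```

If gender were instead related to both party and feelings towards Obama, the male-only estimates would be systematically off: that is the bias that excluding groups can introduce.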
We cannot take an infinite number of random samples from our population.
We can use our understanding of uncertainty to increase our confidence in a single or finite number of random samples from our population (consistency).
We need to ensure that we are not excluding groups of observations from the population from which we draw those random samples (bias).
We aim to have consistent and unbiased estimates of our coefficients.